Reinforcement Learning and Shaping: Encouraging Intended Behaviors

Authors

  • Adam Laud
  • Gerald DeJong
Abstract

We explore dynamic shaping to integrate our prior beliefs about the final policy into a conventional reinforcement learning system. Shaping provides a positive or negative artificial increment to the native task rewards in order to encourage or discourage behaviors. Previously, shaping functions have been static: the additional rewards do not vary with experience. But some prior knowledge cannot be expressed as static shaping. We take an explanation-based approach in which the specific shaping function emerges from initial experiences with the world. We compare no shaping, static shaping, and dynamic shaping in the task of learning bipedal walking on a simulator. We empirically evaluate the convergence rate and final performance among these conditions while varying the accuracy of the prior knowledge. We conclude that in the appropriate context, dynamic shaping can greatly improve the learning of action policies.
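
The distinction the abstract draws between static and dynamic shaping can be illustrated with a minimal Python sketch. This is not the authors' explanation-based method, which the abstract does not specify. The class names, the `observe` update rule, and the `step_size` parameter are all hypothetical and chosen only to show the idea: a static shaper adds a fixed increment to the native reward, while a dynamic shaper starts from a prior belief and revises its increment as experience arrives.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable


@dataclass
class StaticShaping:
    """Fixed bonus per action; the increment never varies with experience."""
    bonus_table: Dict[Hashable, float]

    def bonus(self, action: Hashable) -> float:
        return self.bonus_table.get(action, 0.0)

    def observe(self, action: Hashable, native_reward: float) -> None:
        # Static shaping ignores experience by definition.
        pass


@dataclass
class DynamicShaping:
    """Hypothetical dynamic shaper (an assumption, not the paper's algorithm).

    Each action's bonus starts at a prior value and is moved toward the
    native rewards actually observed for that action.
    """
    prior: Dict[Hashable, float]
    step_size: float = 0.5
    estimates: Dict[Hashable, float] = field(default_factory=dict)

    def bonus(self, action: Hashable) -> float:
        # Fall back to the prior belief until experience is available.
        return self.estimates.get(action, self.prior.get(action, 0.0))

    def observe(self, action: Hashable, native_reward: float) -> None:
        current = self.bonus(action)
        # Exponential moving average toward the observed native reward.
        self.estimates[action] = current + self.step_size * (native_reward - current)


def shaped_reward(native_reward: float, shaper, action: Hashable) -> float:
    """Native task reward plus the artificial shaping increment."""
    return native_reward + shaper.bonus(action)


if __name__ == "__main__":
    static = StaticShaping({"step_forward": 1.0})
    dynamic = DynamicShaping({"step_forward": 1.0})
    # Suppose the prior was wrong and stepping forward is penalized natively.
    for shaper in (static, dynamic):
        shaper.observe("step_forward", -1.0)
        print(type(shaper).__name__, shaped_reward(-1.0, shaper, "step_forward"))
```

In this sketch an inaccurate prior keeps biasing the static learner indefinitely, whereas the dynamic shaper's bonus drifts away from the mistaken belief after a few observations. That is one simple way to read the abstract's point about varying the accuracy of the prior knowledge.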

Similar resources

An Empirical Study of Potential-Based Reward Shaping and Advice in Complex, Multi-Agent Systems

This paper investigates the impact of reward shaping in multi-agent reinforcement learning as a way to incorporate domain knowledge about good strategies. In theory, potential-based reward shaping does not alter the Nash Equilibria of a stochastic game, only the exploration of the shaped agent. We demonstrate empirically the performance of reward shaping in two problem domains within the context...

Dynamic Obstacle Avoidance by Distributed Algorithm based on Reinforcement Learning (RESEARCH NOTE)

In this paper we focus on the application of reinforcement learning to obstacle avoidance in dynamic environments in wireless sensor networks. A distributed algorithm based on reinforcement learning is developed for sensor networks to guide a mobile robot through the dynamic obstacles. The sensor network models the danger of the area under coverage as obstacles, and has the property of adoption o...

Embodied Evolution of Learning Ability

Embodied evolution is a methodology for evolutionary robotics that mimics the distributed, asynchronous, and autonomous properties of biological evolution. The evaluation, selection, and reproduction are carried out by cooperation and competition of the robots, without any need for human intervention. An embodied evolution framework is therefore well suited to study the adaptive learning mechan...

Potential-Based Shaping and Q-Value Initialization are Equivalent

Shaping has proven to be a powerful but precarious means of improving reinforcement learning performance. Ng, Harada, and Russell (1999) proposed the potential-based shaping algorithm for adding shaping rewards in a way that guarantees the learner will learn optimal behavior. In this note, we prove certain similarities between this shaping algorithm and the initialization step required for seve...

Analysing the Effects of Reward Shaping in Multi-Objective Stochastic Games

The majority of Multi-Agent Reinforcement Learning (MARL) implementations aim to optimise systems with respect to a single objective, despite the fact that many real world problems are inherently multi-objective in nature. Research into multi-objective MARL is still in its infancy, and few studies to date have dealt with the issue of credit assignment. Reward shaping has been proposed as a mean...


Publication date: 2002